Day 4

Uroš Godnov

Graphics

  • base: old system
  • lattice: plots are described with a formula
  • ggplot2

Base

  • example of base graphics
index<-data.frame(year=2007:2016, pop=sample(10000:20000, size=10))

plot(index$year,index$pop)

Base - lines

  • type lines
#lines
plot(index$year,index$pop, type="l")

Base - histogram

  • type "h": histogram-like vertical lines, one per observation
plot(index$year,index$pop, type="h")
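Note that type="h" does not bin any values. For a histogram of the distribution of a single variable, base graphics provides hist(). A minimal sketch, assuming the same index data frame as above (set.seed() is an addition here, used only to make the random sample reproducible):

```r
# Recreate the example data; the seed is an assumption added for reproducibility
set.seed(1)
index <- data.frame(year = 2007:2016, pop = sample(10000:20000, size = 10))

# hist() bins the values of one variable and plots the count per bin
h <- hist(index$pop, main = "Distribution of population", xlab = "population")

# the returned object holds the computed bins and their counts
h$counts
```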

Base - title and axis labels

  • adding title and axis labels
plot(index$year,index$pop, type="l", main="Population by year",
     xlab="year",ylab="population")

Lab

  • use the mtcars dataset and create a line plot with cyl on the x axis and mean mpg on the y axis

Lattice - creating a formula

  • pop~year or mean mpg~cyl
lattice::dotplot(year~pop,data=index,main="Population by year",
        ylab="year",xlab="population")

Lattice - problem with y-values

  • we must change year values to factors
index$year<-as.factor(index$year)
lattice::dotplot(year~pop,data=index,main="Population by year",
        ylab="year",xlab="population")

Lattice - histogram

  • for a histogram the formula has no left-hand side: ~x
lattice::histogram(~pop,data=index)

Lattice - more complex formula

lattice::xyplot(mpg~wt | factor(cyl), data=mtcars, pch=19,
                main="MPG vs Wt", xlab="Wt/1,000",  ylab="MPG")

Lab

  • use the mtcars dataset and create a barchart of mean mpg by number of gears, per cyl

ggplot2

  • the grammar of graphics is the foundation for many graphics applications:

    • ggplot2
    • Tableau
    • Vega-Lite

ggplot2 - The idea

ggplot2

  • data: data.frame
  • aes: maps data to visual properties such as colour, shape and size
  • stats: transform input variables to display values
  • scales: translate between variable ranges and property ranges - categories -> colours; numbers -> positions
  • geoms: geometric objects (points, lines, …)
  • facets: form a matrix of panels defined by row and column faceting variables
  • coordinates: define the physical mapping of the aesthetics to the paper
  • themes: every part of the graphic that is not linked to the data
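As an illustration of how these components fit together, the sketch below builds one plot in which each line corresponds to one component. The choice of the mpg dataset and of the specific geoms, scale, facet and theme is an assumption made for this example only.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # data + aes
  geom_point(aes(colour = class)) +                            # geom
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +    # stat: fits a line
  scale_colour_brewer(type = "qual") +                         # scale
  facet_wrap(~ year) +                                         # facet
  coord_cartesian() +                                          # coordinates
  theme_minimal()                                              # theme
p
```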

ggplot2 - basic object

  • which data to plot
  • which columns to use for x and y
  • how to draw the plot
  • “+” is used to combine the ggplot2 elements
library(ggplot2)
ggplot(data=faithful,
       mapping=aes(x=eruptions,
                   y=waiting))+
  geom_point()

ggplot2 - different syntax

ggplot(data=faithful)+
  geom_point(mapping=aes(x=eruptions,
                   y=waiting))

ggplot()+
  geom_point(data=faithful,mapping=aes(x=eruptions,
                   y=waiting))

ggplot2 - adding colour

  • we can create subgroups by mapping the colour aesthetic, here to a logical condition
ggplot(data=faithful)+
  geom_point(mapping=aes(x=eruptions,y=waiting,
                         colour=eruptions<3))

ggplot2 - different geometry

  • some geoms only need a single mapping and will calculate the rest for you
ggplot(data=faithful)+
  geom_histogram(mapping=aes(x=eruptions))

ggplot2 - many layers

  • layers are stacked in the order they appear in the code
ggplot(data=faithful,
       mapping=aes(x=eruptions,
                   y=waiting))+
  geom_density_2d()+
  geom_point()

Lab

  • open ggplot2.txt. Solve exercises 1-4!

ggplot2 - statistics - 1

  • geom_bar uses stat_count() by default
ggplot(mpg) + 
  geom_bar(aes(x = class))

ggplot2 - statistics - 2

  • you can pre-calculate
  • and use stat = 'identity'
library(dplyr)  # provides %>% and count()
mpg_counted <- mpg %>% 
  count(class, name = 'count')
  
ggplot(mpg_counted) + 
  geom_bar(aes(x = class, 
              y = count), 
        stat = 'identity')
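For pre-calculated values, ggplot2 also offers geom_col(), which is geom_bar() with stat = 'identity' already set. A short sketch under the same assumptions as the slide (dplyr is loaded for %>% and count()):

```r
library(ggplot2)
library(dplyr)

# count the rows per class up front
mpg_counted <- mpg %>%
  count(class, name = "count")

# geom_col() draws the bar heights exactly as given in y
p <- ggplot(mpg_counted) +
  geom_col(aes(x = class, y = count))
p
```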

ggplot2 - statistics - 3

  • using after_stat() (ggplot2 version 3.3.0 or later)
  • modifying mapping from stats
ggplot(mpg) + 
  geom_bar(aes(x = class, 
  y = after_stat(100 * count / 
                  sum(count))))

Values calculated by the stat are available through the after_stat() function inside aes()
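To see which values a stat has calculated, the layer's data can be inspected with layer_data(). A minimal sketch; the object name computed is chosen for this example only:

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_bar(aes(x = class))

# layer_data() returns the data frame of a layer after the stat has run
computed <- layer_data(p, 1)

# count and prop are the variables computed by stat_count()
computed[, c("x", "count", "prop")]
```

Any of these columns can be referenced inside after_stat().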

Lab

  • open ggplot2 and solve the 5th exercise
  • additional hint: … + stat_summary(aes(x = class, y = hwy), fun = ?, geom = "point", color = ?)

ggplot2 - scales - 1

ggplot(mpg) + 
  geom_point(
  aes(x = displ, 
      y = hwy, 
  colour = class))

  • scales define how the mapping you specify inside aes() should happen. All mappings have an associated scale even if not specified
  • based on the vector type of class, a discrete colour scale is picked
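The effect of the vector type can be seen by mapping colour to a numeric column instead. A small sketch; the choice of cty as the numeric variable is an assumption for illustration:

```r
library(ggplot2)

# class is a character vector, so a discrete colour scale is used
p_discrete <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class))

# cty is numeric, so ggplot2 picks a continuous colour gradient instead
p_continuous <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = cty))

p_discrete
p_continuous
```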

ggplot2 - scales - 2

  • we can take control by adding one explicitly
  • RColorBrewer::display.brewer.all()
ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy, colour = class)) + 
  scale_colour_brewer(type = 'qual')

ggplot2 - scales - 3

  • positional mappings (x and y) also have associated scales.
ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  scale_x_continuous(breaks = c(3, 5, 6)) + 
  scale_y_continuous(trans = 'log10')

Lab

  • open ggplot2 and solve exercises 6-7

ggplot2 - facet

  • the facet defines how data is split among panels. The default facet (facet_null()) puts all the data in a single panel
  • facet_wrap() and facet_grid() allow you to specify different types of small multiples
  • mind the scales
ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class)
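By default all panels share the same axis ranges. A sketch of how this could be relaxed for the y axis, assuming the same plot as above:

```r
library(ggplot2)

# scales = "free_y" lets every panel pick its own y range;
# the x axis stays shared across panels
p_free <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_wrap(~ class, scales = "free_y")
p_free
```

Free scales make patterns inside each panel easier to see, but comparing values between panels becomes harder.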

Lab

  • open ggplot2 and solve exercises 8-9
  • the scales and space arguments accept "free_x", "free_y" and "free"

ggplot2 - coordinates - 1

  • the coordinate system is the fabric you draw your layers on in the end
  • the default coord_cartesian() provides the standard rectangular x-y coordinate system
  • changing the coordinate system can have dramatic effects
ggplot(mpg) + 
  geom_bar(aes(x = class)) + 
  coord_polar()

ggplot2 - coordinates - 2

  • changing mapping
ggplot(mpg) + 
  geom_bar(aes(x = class)) + 
  coord_polar(theta = 'y') + 
  expand_limits(y = 70)

ggplot2 - coordinates - 3

  • zooming with scales removes data outside limits
ggplot(mpg) + 
  geom_bar(aes(x = class)) + 
  scale_y_continuous(limits = c(0, 40))

ggplot2 - coordinates - 4

  • zooming with coords creates proper zoom
ggplot(mpg) + 
  geom_bar(aes(x = class)) + 
  coord_cartesian(ylim = c(0, 40))
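The difference between the two ways of zooming can be checked by looking at the layer data. A sketch with hypothetical object names; values outside the scale limits are expected to show up as NA in the scale version, while the coord version keeps all bars:

```r
library(ggplot2)

p_zoom_scale <- ggplot(mpg) +
  geom_bar(aes(x = class)) +
  scale_y_continuous(limits = c(0, 40))

p_zoom_coord <- ggplot(mpg) +
  geom_bar(aes(x = class)) +
  coord_cartesian(ylim = c(0, 40))

# layer_data() shows what each layer will draw;
# warnings about removed values are suppressed for the scale version
d_scale <- suppressWarnings(layer_data(p_zoom_scale))
d_coord <- layer_data(p_zoom_coord)

d_scale$ymax  # bar tops after applying the scale limits
d_coord$ymax  # bar tops are untouched, only the view is cropped
```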

ggplot2 - coordinates - 5

  • scale vs. coordinate transformation
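library(patchwork)  # provides | for placing plots side by side
# named p_scale and p_coord to avoid masking base::scale()
p_scale <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p_coord <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  coord_trans(x = "log10", y = "log10")
(p_scale | p_coord)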
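With points alone the two plots differ mainly in their axes. The difference becomes clearer once a stat is involved: scale transformations are applied before statistics are computed, coordinate transformations afterwards. A sketch using a linear fit; the object names are chosen for this example only:

```r
library(ggplot2)

p_fit <- ggplot(diamonds, aes(carat, price)) +
  geom_smooth(method = "lm", formula = y ~ x)

# scale transformation: the data are log-transformed first,
# then the line is fitted, so it stays straight
p_scale_fit <- p_fit + scale_x_log10() + scale_y_log10()

# coordinate transformation: the line is fitted on the raw values,
# then bent by the coordinate system
p_coord_fit <- p_fit + coord_trans(x = "log10", y = "log10")
```

Printing p_coord_fit may emit warnings where the fit on raw values is not positive, since those values have no logarithm.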

Lab

  • open ggplot2 and solve the 10th exercise

ggplot2 - themes - 1

  • theming defines the look and feel of your final visualisation
  • it is very easy to change the look with a prebuilt theme
  • a few themes ship with the ggplot2 package
  • for more, install the ggthemes package
ggplot(mpg) + 
  geom_bar(aes(y = class)) + 
  facet_wrap(~year) + 
  theme_minimal()

ggplot2 - themes - 2

    ggplot(mpg) + 
      geom_bar(aes(y = class)) + 
      facet_wrap(~year) + 
      labs(title = "Number of car models per class",
           caption = "source: http://fueleconomy.gov",
           x = NULL,
           y = NULL) +
      scale_x_continuous(expand = c(0, NA)) + 
      theme_minimal() + 
      theme(
        text = element_text('Avenir Next Condensed'),
        strip.text = element_text(face = 'bold', hjust = 0),
        plot.caption = element_text(face = 'italic'),
        panel.grid.major = element_line('white', linewidth = 0.5),  # linewidth replaces size from ggplot2 3.4.0
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.ontop = TRUE
      )

ggplot2 - themes - 3

ggplot extensions

link

ggplot GUI

  • package esquisse
  • generates code
  • copy+paste code

ggplot2 - ggrepel - 1

  • the problem with labels: they usually overlap
  • geom_text_repel()
  • geom_label_repel()
ggplot(mtcars, aes(x=wt, y=mpg))+geom_point()+
  geom_label(label= row.names(mtcars))

ggplot2 - ggrepel - 2

library(ggrepel)

ggplot(mtcars, aes(x=wt, y=mpg, label= row.names(mtcars)))+
  geom_point()+
  geom_label_repel()

Lab

Take the iris data frame. Create a scatter plot of Sepal.Length and Sepal.Width and label the points with Species. Make sure the labels don’t overlap.

ggplot2 - ggforce - 1

  • adds missing functionality, e.g. additional facets
library(ggforce)
ggplot(mtcars, aes(hp, mpg, colour = as.character(cyl))) +
  geom_point()+
  facet_zoom(x = cyl == 6)+theme(legend.title = element_blank())

ggplot2 - extensions

  • plot composition:

    • gridExtra::grid.arrange()
    • ggpubr::ggarrange()
    • patchwork

ggplot2 - plot composition - 1

p1 <- ggplot(msleep) + 
  geom_boxplot(aes(x = sleep_total, y = vore, fill = vore))
p2 <- ggplot(msleep) + 
  geom_bar(aes(y = vore, fill = vore))
p3 <- ggplot(msleep) + 
  geom_point(aes(x = bodywt, y = sleep_total, colour = vore)) + 
  scale_x_log10()

ggplot2 - plot composition - 2

  • combining them with patchwork is a breeze using the different operators
library(patchwork)  # provides the | and / operators for plots
(p1 | p2) / 
   p3

ggplot2 - plot composition - 3

  • combining them with patchwork is a breeze using the different operators
  • plot_layout(guides = 'collect') collects the legends
p_all <- (p1 | p2) / 
            p3
p_all + plot_layout(guides = 'collect')

Lab

Patchwork will assign the same amount of space to each plot by default, but this can be controlled with the widths and heights arguments in plot_layout(). Each takes a numeric vector giving the relative sizes (e.g. c(2, 1) will make the first plot twice as big as the second). Modify the code below so that the middle plot takes up half of the total space:

p <- ggplot(mtcars) + 
  geom_point(aes(x = disp, y = mpg))
p + p + p

ggplot2 - animation - 1

  • gganimate package
  • gganimate extends the API and grammar to describe animations
  • many different transitions that control how data is interpreted for animation
ggplot(economics) + 
  geom_line(aes(x = date, y = unemploy))

ggplot2 - animation - 2

  • transition_reveal: reveal data along a given dimension
library(gganimate)  # provides the transition_*() functions
ggplot(economics) + 
  geom_line(aes(x = date, y = unemploy)) + 
  transition_reveal(along = date)

ggplot2 - animation - 3

  • transition_states: transition between several distinct stages of the data
ggplot(mpg) + geom_bar(aes(x = factor(cyl))) + 
  labs(title = 'Number of cars in {closest_state} by number of cylinders') + 
  transition_states(states = year) + enter_grow() + 
  exit_fade()

Lab

In the animation below (as in all the other animations) the changes happen at constant speed. How values change during an animation is called easing and can be controlled using the ease_aes() function. Read the documentation for ease_aes() and experiment with different easings in the animation, e.g. ease_aes("bounce-in-out")

    mpg2 <- tidyr::pivot_longer(mpg, c(cty,hwy))
    ggplot(mpg2) + 
      geom_point(aes(x = displ, y = value)) + 
      ggtitle("{if (closest_state == 'cty') 'Efficiency in city' else 'Efficiency on highway'}") + 
      transition_states(name)